This document describes fitting a hierarchical ordination to data, including code and the details that are needed.
The data are available on figshare; they were originally collected by Ribera et al. (2001) and reanalyzed by Niku et al. (2021). They include abundance observations of 68 ground beetle species at 87 sites in Scotland, along with 17 environmental variables and 20 traits. The aim is to determine how the environmental variables and traits affect composition in the ecological community, including whether they interact. The methodological question is how hierarchical ordination helps.
The abundances, traits and environment are each stored in a different matrix. First we load the data and set some constants:
library(nimble)
library(nimbleHMC)
library(coda)
library(lattice)
# Response data
Y <- t(read.csv("Beetles/Y.csv"))
colnames(Y) <- Y[2,]
Y<-Y[-c(1:2),-c(1,70:71)]
Y <- as.data.frame(apply(Y,2,as.integer))
# Environmental predictors
X <- read.csv("Beetles/X.csv")[,-c(1:5)]
X <- as.data.frame(apply(X,2,as.numeric))
X$Sampling.year <- X$Sampling.year - min(X$Sampling.year)
X$Texture <- as.factor(X$Texture)
# Traits
TR <- read.csv("Beetles/TR.csv")
row.names(TR) <- TR$SPECIES
TR <- TR[,-c(1:3)]
# Traits to categorical
TR[,c("CLG","CLB","WIN","PRS","OVE","FOA","DAY","BRE","EME","ACT")] <- apply(TR[,c("CLG","CLB","WIN","PRS","OVE","FOA","DAY","BRE","EME","ACT")],2,as.factor)
# Data standardization
X <- scale(model.matrix(~.,X))[,-1] # environmental variables
TR <- scale(model.matrix(~.,TR))[,-1] # species traits
# Remove some categorical "traits" that only have one or two observations
TR <- TR[,!colnames(TR)%in%c("OVE2?","ACT2?")]
# Constants
NSites <- nrow(Y) # number of sites
NSpecies <- ncol(Y) # number of species
NTraits <- ncol(TR) # number of traits
NEnv <- ncol(X) # number of environmental predictors
# create data lists for Nimble
dat <- list(Y = Y, X = as.matrix(X), TR = as.matrix(TR))
consts <- list(NSites = NSites, NEnv = NEnv, NTraits = NTraits, NSpecies = NSpecies)
First we check the mean-variance relationship of the counts; variances well above the means suggest overdispersion relative to the Poisson:
library(mvabund) # provides meanvar.plot()
meanvar.plot(Y, ylab="Sample variances (log scale)", xlab="Sample means (log scale)")
abline(a = 0, b = 1, col = "blue")
See Warton (2008) for more information.
So, we assume that the counts for each species follow a negative binomial distribution with a log link function, as we would in a standard generalized linear model. We assume that each species has a different mean abundance (i.e. for each species \(j\) we have a different intercept \(\beta_{0j}\)), and model the rest of the variation with a hierarchical ordination. This gives the following mean model on the link scale (with linear predictor \(\eta_{ij}\)):
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]
As with any ordination, \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are the site scores and species loadings; the matrix \(\boldsymbol{Z}\) holds the site scores as its rows, so its columns are the latent variables. Here we assume that each latent variable has a variance of one, so that \(\boldsymbol{\Sigma}\) holds the variation of the latent variables: it will typically be a diagonal matrix, and if any of the terms on the diagonal are close to zero, this suggests that the corresponding latent variable has (almost) no effect. The diagonal thus provides a straightforward summary of the relative importance of each latent variable, similar to the square roots of the eigenvalues (the singular values) in an eigenanalysis.
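To make the analogy with singular values concrete, here is a small sketch with toy numbers (not the beetle data): when the site scores and loadings have orthonormal columns, the diagonal of \(\boldsymbol{\Sigma}\) coincides exactly with the singular values of the low-rank term.

```r
# Toy illustration (made-up sizes, not the beetle data): with orthonormal
# columns in Z and Gamma, diag(Sigma) equals the singular values of
# Z %*% Sigma %*% t(Gamma)
set.seed(1)
Z <- qr.Q(qr(matrix(rnorm(20*2), 20, 2)))  # 20 "sites", 2 LVs, orthonormal
G <- qr.Q(qr(matrix(rnorm(8*2), 8, 2)))    # 8 "species", orthonormal loadings
Sigma <- diag(c(3, 1))                     # LV scales
M <- Z %*% Sigma %*% t(G)
round(svd(M)$d[1:2], 6)                    # recovers 3 and 1
```

In the model the scores and loadings are standardised rather than orthogonalised, so the correspondence is only approximate there, but the interpretation of the diagonal as "importance" carries over.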
If we stopped here, this would be a standard Generalized Linear Latent Variable Model. But here we want to model both \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\), i.e. go from modelling groups of species responding in similar ways to sites, to modelling how the traits of the species affect their responses to the environment.
As a simplification, we can think about this as simply a regression of \(\boldsymbol{z}_i\) (the site effects) against the environmental variables and \(\boldsymbol{\gamma}_j\) against the traits. In reality, it is more complicated, because \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are estimated in the model, so their uncertainty needs to be propagated.
We denote the abundance at site \(i = 1 \dots n\) for species \(j = 1 \dots p\) as \(Y_{ij}\) (or \(\boldsymbol{Y}\) as a matrix). The environmental variables are \(x_{ik}\) for the \(k = 1 \ldots K\) predictors (or \(\boldsymbol{X}\) as a matrix), and the \(t = 1 \ldots T\) traits are collected in \(\boldsymbol{TR}\). We then assume:
\[Y_{ij} \sim \text{NB}(\lambda_{ij}, \phi_j),\] where \(\phi_j\) is the scale parameter for each species. Also,
\[\text{log}(\lambda_{ij}) = \eta_{ij}.\]
Consequently, \(\eta_{ij}\) is the linear predictor, which we further model with the hierarchical ordination:
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]
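As a quick sanity check on the negative binomial parameterisation above (a sketch with made-up values): with success probability \(\phi/(\phi + \lambda)\) and size \(\phi\), the distribution has mean \(\lambda\) and variance \(\lambda + \lambda^2/\phi\), which we can confirm by Monte Carlo with R's rnbinom.

```r
# Monte Carlo check of the NB parameterisation used in the model code below:
# prob = phi/(phi + lambda), size = phi  =>  mean lambda, var lambda + lambda^2/phi
set.seed(1)
lambda <- 5; phi <- 2
y <- rnbinom(1e6, size = phi, prob = phi/(phi + lambda))
mean(y)  # close to 5
var(y)   # close to 5 + 25/2 = 17.5
```

So small \(\phi_j\) means strong overdispersion for species \(j\), and the Poisson is recovered as \(\phi_j \to \infty\).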
We then model \(\boldsymbol{z}_i\) (as in van der Veen et al. (2023)) and \(\boldsymbol{\gamma}_j\) hierarchically:
\[ \boldsymbol{z}_i = \boldsymbol{B}^\top\boldsymbol{x}_i + \boldsymbol{\epsilon}_i, \] and
\[ \boldsymbol{\gamma}_j = \boldsymbol{\omega}^\top\boldsymbol{TR}_{j} + \boldsymbol{\varepsilon}_j, \] where \(\boldsymbol{\epsilon}_i\) and \(\boldsymbol{\varepsilon}_j\) are normally distributed residuals for sites and species, respectively.
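The two levels can be simulated directly, which may help fix ideas. This is an illustrative sketch with one latent variable and made-up dimensions (not the beetle data); the standardisation of the scores mirrors what the model code does later.

```r
# Simulate from the hierarchical ordination (toy sizes, 1 latent variable)
set.seed(1)
n <- 30; p <- 10; K <- 3; Tr <- 2
X  <- matrix(rnorm(n*K), n, K)               # environmental predictors
TR <- matrix(rnorm(p*Tr), p, Tr)             # species traits
B <- rnorm(K); omega <- rnorm(Tr)            # predictor effects on the LV
zstar     <- drop(X %*% B) + rnorm(n)        # site scores (unscaled)
gammastar <- drop(TR %*% omega) + rnorm(p)   # species loadings (unscaled)
z     <- (zstar - mean(zstar)) / sd(zstar)             # standardise to
gamma <- (gammastar - mean(gammastar)) / sd(gammastar) # variance 1
sd.LV <- 1.5; beta0 <- rnorm(p)              # LV scale, species intercepts
eta <- matrix(beta0, n, p, byrow = TRUE) + sd.LV * outer(z, gamma)
Y <- matrix(rnbinom(n*p, size = 2, mu = exp(eta)), n, p)  # NB abundances
```

Fitting the model is essentially running this machinery in reverse, while propagating the uncertainty in \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\).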
We fit the model with the Nimble R-package.
# 1LV function
inits <- function(consts){
B = rnorm(consts$NEnv)
O = rnorm(consts$NTraits)
varepsilon = rnorm(consts$NSpecies)
epsilon = rnorm(consts$NSites)
list(
BSTAR = B,
OSTAR = O,
epsilonSTAR = epsilon,
varepsilonSTAR = varepsilon,
sd.Site = rexp(1),
sd.Species = rexp(1),
beta0 = rnorm(consts$NSpecies),
sd.LV = rexp(1),
phi = rexp(consts$NSpecies)
)
}
# 2LV function
inits2 <- function(consts){
B = matrix(rnorm(consts$nLVs*consts$NEnv),ncol=consts$nLVs)
B[upper.tri(B)] = 0
O = matrix(rnorm(consts$nLVs*consts$NTraits),nrow=consts$NTraits)
O[upper.tri(O)] = 0
varepsilon = mvtnorm::rmvnorm(consts$NSpecies,rep(0,consts$nLVs),diag(consts$nLVs))
epsilon = mvtnorm::rmvnorm(consts$NSites,rep(0,consts$nLVs),diag(consts$nLVs))
list(
BSTAR = B,
OSTAR = O,
epsilonSTAR = epsilon,
varepsilonSTAR = varepsilon,
sd.Site = rexp(consts$nLVs),
sd.Species = rexp(consts$nLVs),
beta0 = rnorm(consts$NSpecies),
sd.LV = rexp(consts$nLVs),
phi = rexp(consts$NSpecies)
)
}
# Function to run a single chain
RunOneChain <- function(seed, dat, code, inits, consts, ToMonitor=NULL,
Nburn=5e3, NIter=5.5e4, Nthin=10, block = NULL, RW = NULL, ...) {
require(nimble)
require(nimbleHMC)
AllSamplers <- HMCsamplers <- c('epsilonSTAR', 'varepsilonSTAR', 'beta0', 'OSTAR', 'BSTAR',
'sd.Site', 'sd.Species', 'sd.LV','phi')
if(!is.null(block)){
HMCsamplers <- HMCsamplers[!HMCsamplers%in%unique(gsub("\\s*\\[[^\\)]+\\]","",c(unlist(block),unlist(RW))))]
}
if(is.null(ToMonitor)) {
ToMonitor <- c("beta0", "sd.Species", "sd.Site", "sd.LV", "B", "O",
"epsilon", "varepsilon", "gamma", "z", "phi")
}
mod <- nimbleModel(code = code, name = "HO", constants = consts,
inits = inits(consts), data = dat, buildDerivs = TRUE)
model <- compileNimble(mod)
# Do HMC
conf <- configureHMC(model, nodes = HMCsamplers, monitors = ToMonitor, print = FALSE,
control=list(nwarmup=Nburn, kappa = 0.95, maxTreeDepth = 30))
if(!is.null(block)) {
if(is.list(block)){
# Use a block (AF) slice sampler for everything that is not being HMC'ed
lapply(block, conf$addSampler, type = "AF_slice", sliceAdaptFactorInterval = Nburn)
}else{
# Use a block (AF) slice sampler for everything that is not being HMC'ed
sapply(block, conf$addSampler, type = "AF_slice", sliceAdaptFactorInterval = Nburn)
}
}
if(!is.null(RW)) {
if(is.list(RW)){
# Use a random walk sampler for these parameters instead of HMC
lapply(RW, conf$addSampler, type = "RW")
}else{
# Use a random walk sampler for these parameters instead of HMC
sapply(RW, conf$addSampler, type = "RW")
}
}
mcmc <- buildMCMC(conf)
cmcmc <- compileNimble(mcmc, project = model)
res <- runMCMC(cmcmc, niter=NIter, nburnin = Nburn, thin=Nthin,
nchains = 1, samplesAsCodaMCMC = TRUE, ...)
return(res)
}
# Function to run chains in parallel
ParaNimble <- function(NChains, fun, ...) {
opts <- list(...)
if(!is.null(opts$seeds) && (length(opts$seeds) == NChains)){
seeds <- opts$seeds
opts <- opts[-which(names(opts)=="seeds")]
}else{
seeds <- 1:NChains
}
require(parallel)
nimble_cluster <- makeCluster(NChains)
# Pass only the remaining arguments (seeds removed) on to each chain
samples <- do.call(parLapply, c(list(cl = nimble_cluster, X = seeds, fun = fun), opts))
stopCluster(nimble_cluster)
# Name the chains in the list
chains <- setNames(samples,paste0("chain", 1:length(samples)))
chains
}
# Function to create list of names for parameters to use a block slice sampler for
MakeBlockList <- function(consts, LVwise = TRUE){
# builds list for slice AF sampling
# with LVwise = TRUE we are LV-wise blocking B and epsilon, and O and varepsilon
# otherwise, LVs are jointly blocked
if(LVwise & consts$nLVs>1){
blockList <- c(sapply(1:consts$nLVs,
function(x,consts){
c(paste0("BSTAR[", 1:consts$NEnv,", ",x, "]"),
paste0("epsilonSTAR[", 1:consts$NSites,", ",x, "]"))
}
,
consts = consts,simplify=F),
sapply(1:consts$nLVs,
function(x,consts){
c(paste0("OSTAR[", 1:consts$NTraits,", ",x, "]"),
paste0("varepsilonSTAR[", 1:consts$NSpecies,", ",x, "]"))
}
,
consts = consts,simplify=F)
)
if(consts$nLVs>1){
# remove entries of B,O-STAR that are fixed
for(q in 2:consts$nLVs){
blockList[[q]] <- blockList[[q]][-(1:(q-1))]
blockList[-c(1:consts$nLVs)][[q]] <- blockList[-c(1:consts$nLVs)][[q]][-(1:(q-1))]
}
}
}else{
blockList = list(c("BSTAR","epsilonSTAR"),c("OSTAR","varepsilonSTAR"))
}
blockList
}
# Utility Function to get logical indicators for if names contains a string in v
GetInds <- function(v, names) {
if(length(v)==1) {
res <- grep(v, names)
} else {
res <- c(unlist(sapply(v, grep, x=names)))
}
res
}
# Function to swap signs of all variables varinds to have same sign as vartosign.
ReSignChain <- function(chain, varinds, vartosign) {
# MeanSign <- sign(mean(chain[ ,s])) # Might need this
res <- t(apply(chain, 1, function(x, vs, vi) {
Names <- names(x)[vi]
if(any(grepl(",", Names))) {
lvind <- gsub(".*, ", "", Names)
} else {
lvind <- seq_along(vs)
}
x[vi] <- x[vi]*sign(x[vs[lvind]])
x
}, vs=vartosign, vi=varinds))
as.mcmc(res)
}
# Function to post-process chains to swap signs.
postProcess <- function(Chains, VarsToProcess, VarsToSwapBy = NULL, VarToSign=NULL, print=FALSE) {
if(is.null(VarToSign)) VarToSign <- VarsToProcess
SignInd <- GetInds(VarToSign, colnames(Chains[[1]]))
# Get indicators for all variables to have their signs changed
ProcessInds <- GetInds(VarsToProcess, names = colnames(Chains[[1]]))
# Check if > 1 LV
SeveralLVs <- any(grepl(",", colnames(Chains[[1]])[SignInd]))
# Calculate the variance across chains of the mean of the sign indicator:
# the largest variance should belong to the variable that swaps sign most often
Signs <- as.data.frame(lapply(Chains, function(mat, ind) {
colMeans(mat[,ind]>0)
}, ind=SignInd))
VarSign <- apply(Signs, 1, var)
if(SeveralLVs) {
LV <- gsub(".*, ", "", colnames(Chains[[1]])[SignInd])
# Choose variables whose sign will be used to swap other signs
if(is.null(VarsToSwapBy)){
VarsToSwapBy <- sapply(unique(LV), function(lv, vs) {
vv <- vs[grep(lv, names(vs))]
nm <- names(which(vv==max(vv)))
if(length(nm)>1) nm <- nm[1] # probably something more sophisticated is better
nm
}, vs = VarSign, simplify = TRUE)
}
if(print) message(paste0("Swapping by ", paste(VarsToSwapBy, collapse=", ")))
chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarsToSwapBy, varinds=ProcessInds)
} else { # only 1 LV
if(is.null(VarsToSwapBy)){
# Choose variables whose sign will be used to swap other signs
VarsToSwapBy <- names(which(VarSign==max(VarSign)))[1]
}
if(print) message(paste0("Swapping by ", paste(VarsToSwapBy, collapse=", ")))
# chains.sgn <- lapply(Chains, ReSignChain, varNames=VarsToProcess, sign.var=ind)
chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarsToSwapBy, varinds=ProcessInds)
}
as.mcmc.list(chains.sgn)
}
# Function to convert a variable in an MCMC chain that should be a matrix from
# a vector to the right matrix
ChainToMatrix <- function(ch, name) {
v <- ch[grep(paste0("^", name, "\\["), names(ch))]
l.row <- as.numeric(gsub(".*\\[", "", gsub(",.*", "", names(v))))
l.col <- as.numeric(gsub(".*, ", "", gsub("\\]", "", names(v))))
mat <- matrix(0, nrow=max(l.row), ncol=max(l.col))
for(i in seq_along(v)) { # there must be a quicker way...
mat[l.row[i], l.col[i]] <- v[i]
}
mat
}
# Function to convert a variable in an MCMC chain that should be a vector to a diagonal matrix
ChainToDiag <- function(ch, name) {
v <- ch[grep(paste0("^", name, "\\["), names(ch))]
mat <- diag(v)
mat
}
# Function to rotate a latent variable, defaults to Varimax. Methods in GPArotation package
# ch: one draw from MCMC chain
# RotateBy: name of variable to rotate by (i.e. to calculate the rotation matrix for)
# SDby: name of Standard deviations
# ToRotate: Variables to rotate (e.g. epsilons, gamma/z).
# SDToRotate: Standard deviations to rotate
# rotation.fn: Function in GPArotation to calculate the rotation, or "svd". Defaults to "svd".
# scale: Should RotateBy be scaled by SDby before rotating? Defaults to FALSE
rotate2LV <- function(ch, RotateBy, SDby, ToRotate=NULL, SDToRotate=NULL, rotation.fn="svd",
scale=FALSE) {
require(GPArotation)
# Function to rotate matrices & return full vector or a matrix (retmat)
RotateVar <- function(name, vec, rotmat, retmat=FALSE) {
mat <- ChainToMatrix(ch=vec, name=name)
rotated <- mat%*%rotmat
if(retmat) {
res <- rotated
} else {
vec[grep(paste0("^", name, "\\["), names(vec))] <- c(rotated)
res <- vec
}
res
}
# Function to rotate (diagonal) matrices & return the diagonal
RotateSD <- function(name, vec, rotmat) {
mat <- diag(vec[grep(paste0("^", name, "\\["), names(vec))])
rotated <- mat%*%rotmat
vec[grep(paste0("^", name, "\\["), names(vec))] <- diag(rotated)
vec
}
# Extract SDs and latent variables
SDs <- ch[grep(SDby, names(ch))]
LVmat <- ChainToMatrix(ch, RotateBy)
if(scale) LVmat <- sweep(LVmat, 2, SDs, "*")
# Calculate a rotation matrix
if(rotation.fn!="svd"){
loadings.r <- do.call(rotation.fn, list(A=LVmat)) # rotate
}else{
rot <- svd(LVmat)$v
loadings.r <- list(Th=rot, loadings = LVmat%*%rot)
}
# Retrieve correct scale of LVs
if(scale){
#also need to put LVs into here
ch[grep(paste0("^", RotateBy, "\\["), names(ch))] <- scale(loadings.r$loadings)
ch[grep(paste0("^", SDby, "\\["), names(ch))] <- attr(scale(loadings.r$loadings),"scaled:scale")
} else{
# if LVs are not scaled by LV sds, we just add the new LV scale to LVs SDs.
ch[grep(paste0("^", RotateBy, "\\["), names(ch))] <- scale(loadings.r$loadings)
ch[grep(paste0("^", SDby, "\\["), names(ch))] <- attr(scale(loadings.r$loadings),"scaled:scale")*SDs
}
# Rotate rest of the variables
for(nm in unique(ToRotate)) {
ch <- RotateVar(vec=ch, name=nm, rotmat=loadings.r$Th, retmat = FALSE)
}
for(nm in unique(SDToRotate)) {
ch <- RotateSD(vec=ch, name=nm, rotmat=loadings.r$Th)
}
ch
}
RotateChains <- function(mcmc.lst, RotateBy="z", SDby = "sd.LV",
ToRotate=c("B", "epsilon", "gamma", "O", "varepsilon"),
SDToRotate=c("sd.Site", "sd.Species"), ...) {
if(SDby%in%SDToRotate)stop("'SDby' should not be present in 'SDToRotate'.")
if(RotateBy%in%ToRotate)stop("'RotateBy' should not be present in 'ToRotate'.")
rotated <- lapply(mcmc.lst, function(mcmc) {
rot <- apply(mcmc, 1, rotate2LV, RotateBy=RotateBy, SDby = SDby,
ToRotate=ToRotate, SDToRotate=SDToRotate, ...)
as.mcmc(t(rot))
})
as.mcmc.list(rotated)
}
RescaleVars <- function(vec, ScaleBy, SDTorescale, ToRescale) {
vec2 <- vec
ScBy <- ChainToMatrix(vec, ScaleBy)
SD <- apply(ScBy, 2, sd)
for(nm in unique(c(ScaleBy, ToRescale))) {
Sc.x <- ChainToMatrix(vec, nm)
Sc.x.sc <- sweep(Sc.x, 2, SD, "/")
vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- c(Sc.x.sc)
}
for(nm in SDTorescale) {
SD.x <- vec[grep(paste0("^", nm, "\\["), names(vec))]
vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- SD.x/SD
}
vec2
}
RescaleChains <- function(mcmc.lst, ...) {
rescale <- lapply(mcmc.lst, function(mcmc) {
rot <- apply(mcmc, 1, RescaleVars, ...)
as.mcmc(t(rot))
})
as.mcmc.list(rescale)
}
PlotPost <- function(var, summ, varnames=NULL, ...) {
vars <- grep(var, rownames(summ$statistics))
if(is.null(varnames)) varnames <- rownames(summ$statistics)[grep(var, rownames(summ$statistics))]
if(length(varnames)!=length(vars))
stop(paste0("Number of variable names, ", length(varnames), "not the same as number of variables, ",
length(vars)))
plot(summ$statistics[vars,"Mean"], 1:length(vars),
xlim=range(summ$quantiles[vars,]), yaxt="n", ...)
segments(summ$quantiles[vars,"2.5%"], 1:length(vars), summ$quantiles[vars,"97.5%"], 1:length(vars))
segments(summ$quantiles[vars,"25%"], 1:length(vars), summ$quantiles[vars,"75%"], 1:length(vars), lwd=3)
abline(v=0, lty=3)
axis(2, at=1:length(vars), labels=varnames, las=1)
}
AddArrows <- function(coords, marg= par("usr"), col=2) {
origin <- c(mean(marg[1:2]), mean(marg[3:4]))
Xlength <- sum(abs(marg[1:2]))/2
Ylength <- sum(abs(marg[3:4]))/2
ends <- coords / max(abs(coords)) * min(Xlength, Ylength) * .8
arrows(
x0 = origin[1],
y0 = origin[2],
x1 = ends[,
1] + origin[1],
y1 = ends[, 2] + origin[2],
col = col,
length = 0.1)
text(
x = origin[1] + ends[, 1] * 1.1,
y = origin[2] + ends[, 2] * 1.1,
labels = rownames(coords),
col = col)
}
# Function to get MCMC results for reduced-rank approximated interaction term
getMCMCInt <- function(mcmc.lst, dat,...) {
int.mcmc <- lapply(mcmc.lst, function(mcmc){
inCoefs <- as.mcmc(t(apply(mcmc, 1, function(ch){
B <- ChainToMatrix(ch,"B")
O <- ChainToMatrix(ch,"O")
SDs <- diag(ch[grep("sd.LV",gsub("\\s*\\[[^\\)]+\\]","",names(ch)))])
coefs <- B%*%SDs%*%t(O)
vec <- c(coefs)
names(vec) <- paste0(colnames(dat$X),":",rep(colnames(dat$TR),each=ncol(dat$X)))
vec
})))
}
)
as.mcmc.list(int.mcmc)
}
ExtractMeans <- function(mcmc.lst, name=NULL) { # this needs defensive programming
if(is.null(name)) {
ind <- 1:nrow(mcmc.lst$statistics)
} else {
ind <- grep(name, rownames(mcmc.lst$statistics))
}
Means <- mcmc.lst$statistics[ind,"Mean"]
if(length(grep(",", names(Means))) > 0) {
Col <- gsub("]", "", unique(gsub(".*, ", "", names(Means))))
res <- sapply(Col, function(cc, mn) {
mn[grep(paste0(cc,"]"), names(mn))]
}, Means)
} else {
res <- Means
}
res
}
GetMeans <- function(summ, name, d=1) {
if(is.null(d)) d <- 1
var <- summ$statistics[grep(paste0('^', name), rownames(summ$statistics)),"Mean"]
matrix(var,ncol=d)
}
We start with a single dimension for simplicity, so that we can show the steps needed.
To fit the model we need to 1) code up the likelihood, 2) specify the Markov chain Monte Carlo (MCMC) samplers and initial values, and 3) run it. We parallelise the MCMC so that each chain runs on a different processor. This means we need a function to run one chain, and another to run all of the chains together.
OneLV.nim <- nimbleCode({
for (i in 1:NSites) {
for (j in 1:NSpecies) {
# These three lines specify the model:
Y[i, j] ~ dnegbin(phi[j]/(phi[j]+lambda[i, j]), phi[j]) # Likelihood: Y follows a NB distribution
log(lambda[i, j]) <- eta[i, j] # Specify the log link function
eta[i,j] <- beta0[j] + gamma[j]*sd.LV*z[i] # linear predictor: species + LV
}
# Site scores: regression against environmental effects
# the Bs are the coefficients of the environmental effects
epsilonSTAR[i] ~ dnorm(0, sd = sd.Site) # Residual site effect
XB[i] <- inprod(X[i, 1:NEnv],BSTAR[1:NEnv])
zSTAR[i] <- XB[i] + epsilonSTAR[i]
}
for(j in 1:NSpecies) {
# Species effects: regression against trait effects
# The Os are the trait effects.
varepsilonSTAR[j] ~ dnorm(0, sd = sd.Species) # Residual
omegaTR[j] <- inprod(TR[j, 1:NTraits],OSTAR[1:NTraits])
gammaSTAR[j] <- omegaTR[j] + varepsilonSTAR[j]
beta0[j] ~ dnorm(0, sd = 1) # Species means
}
# Here we standardise z and gamma, so they both have a variance of 1.
# The standard deviation of the latent variable is sd.LV.
zmu <- mean(zSTAR[1:NSites])
gammamu<- mean(gammaSTAR[1:NSpecies])
StdDev.z <- sqrt(mean((zSTAR[1:NSites] - zmu)^2))
StdDev.gamma <- sqrt(mean((gammaSTAR[1:NSpecies] - gammamu)^2))
z[1:NSites] <- zSTAR[1:NSites]/StdDev.z
gamma[1:NSpecies] <- gammaSTAR[1:NSpecies]/StdDev.gamma
# Scale other parameters
epsilon[1:NSites] <- epsilonSTAR[1:NSites]/StdDev.z
B[1:NEnv] <- BSTAR[1:NEnv]/StdDev.z
varepsilon[1:NSpecies] <- varepsilonSTAR[1:NSpecies]/StdDev.gamma
O[1:NTraits] <- OSTAR[1:NTraits]/StdDev.gamma
# priors for scales
sd.Site ~ dexp(1)
sd.Species ~ dexp(1)
sd.LV ~ dexp(1)
for(k in 1:NEnv){
BSTAR[k] ~ dnorm(0, sd = 1)
}
for(tr in 1:NTraits){
OSTAR[tr] ~ dnorm(0, sd = 1)
}
for(j in 1:NSpecies){
phi[j] ~ dexp(1)
}
})
Latent variable models are notorious for being unidentifiable: you can get the same mean abundances from different combinations of the parameters. We have to make some adjustments to the model to account for this: some of this is done in the model fitting, but for the rest it is easier to do it after we obtain the posterior samples.
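A minimal sketch of the sign-switching problem (toy numbers, not the fitted model): flipping the signs of both the site scores and the loadings leaves the linear predictor, and hence the likelihood, unchanged, which is why we post-process the chains rather than hoping the sampler settles on one sign.

```r
# The sign flip (z, gamma) -> (-z, -gamma) leaves eta unchanged
set.seed(1)
z <- rnorm(5); gamma <- rnorm(4); sd.LV <- 2; beta0 <- rnorm(4)
eta1 <- matrix(beta0, 5, 4, byrow = TRUE) + sd.LV * outer(z, gamma)
eta2 <- matrix(beta0, 5, 4, byrow = TRUE) + sd.LV * outer(-z, -gamma)
all.equal(eta1, eta2)  # TRUE
```

The postProcess() function above deals with exactly this, by aligning the sign of each draw with a chosen reference variable.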
Finally, we can run the MCMC. We use a block slice sampler for the predictor effects and residual terms, a random walk sampler for the intercepts and dispersion parameters, and Hamiltonian Monte Carlo for the remaining scale parameters.
consts$nLVs = 1
chains.unsgn <- ParaNimble(4, fun = RunOneChain,
dat = dat,
code = OneLV.nim,
inits = inits,
# Nburn = 5e1, NIter = 5.5e2, Nthin = 1, # for a small run
Nburn = 1e3, NIter = 2e3, Nthin = 1, # for a big HMC run
consts = consts,
block = MakeBlockList(consts, LVwise = TRUE), RW = c(paste0("beta0[",1:consts$NSpecies,"]"),paste0("phi[",1:consts$NSpecies,"]"))) # HMC on the rest (hypers)
# post-process chains for sign-swapping
VarsToProcess <- c("^B", "^O", "^epsilon", "^varepsilon", "^z", "^gamma")
VarsToSign <- c("^B")
chains <- postProcess(chains.unsgn, VarsToProcess = VarsToProcess,
VarToSign = VarsToSign, print = TRUE, VarsToSwapBy = "B[26]")
summ.HO = summary(chains)
There are a lot of parameters, so a lot of results to look at. Here we concentrate on \(\boldsymbol{B}\), \(\boldsymbol{\omega}\), \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\), and a latent-variable plot.
First the environmental effects:
library(basicMCMCplots)
chainsPlot(chains, var = "B", legend = F)
Then the trait effects:
chainsPlot(chains, var = "O", legend = F)
These look quite OK (after post-processing the signs). Now we can make a caterpillar plot of the environmental effects to inspect the results.
par(mar=c(4.1, 7, 1, 1))
PlotPost(var = "B", summ = summ.HO, varnames = gsub("\\.", " ", colnames(X)),
xlab = "Environmental Effect", ylab = "")
We can see that all effects are quite uncertain, but that management intensity, elevation, and pH have non-zero effects on the latent variable. Management has a positive coefficient, elevation negative, and pH positive, so beetles with a positive species score will be more abundant in areas with more intense management, lower elevation, and more basic pH; i.e., lowlands.
We can also look at the trait effects. The bottom line is that there is too much uncertainty to say much, but LTL (beetle length) has a non-zero negative effect.
par(mar=c(4.1, 7, 1, 1))
PlotPost(var = "O", summ = summ.HO, varnames = gsub("\\.", "\n", colnames(TR)),
xlab = "Trait Effect", ylab = "")
We can also calculate the full distributions of the site and species scores, and plot them. We see that there is clear variation in the scores, so the trait effects are not uncertain because the species scores are poorly estimated: it is because the trait effects themselves are small.
par(mfrow = c(2, 1), mar = c(2, 12, 4, 1))
PlotPost("^z", summ = summ.HO, varnames = NULL, ylab = "", xlab = "Latent variable 1", main = "Sites")
PlotPost("^gamma", summ = summ.HO, varnames = NULL, ylab = "", xlab = "Latent variable 1", main = "Species")
More than one dimension requires adding extra identifiability constraints. Now that we have two or more latent variables, we also have to worry about their rotation.
There are various ways to place the constraints, some more numerically stable than others. Generally, we note that the model on the link scale is similar to existing matrix decompositions, so that much about the required constraints can be learned from those.
To prevent rotational invariance, we require both \(\boldsymbol{B}\) and \(\boldsymbol{O}\) to have zeros above the main diagonal. van der Veen et al (2023) assumed \(\boldsymbol{B}\) to be a semi-orthogonal matrix instead, as they encountered numerical instability with the constraints that we impose here. However, their constraint would require sampling on constrained parameter spaces, which is difficult, while this constraint formulation allows to use standard out-of-the-box MCMC samplers (also, it seems to work fine here). Together with the scale constraint for the latent variables there are \(d(d-1)-2K\) constraints in the model, and when \(d = K\) the number of parameters is the same in the interaction terms as for the fourth corner model by Niku et al.(2021). When \(d \gt min(K,T)\) the model can still be fitted (but either \(\boldsymbol{B}\) or \(\boldsymbol{O}\) needs to be padded with a column of zeros), but with \(d \gt max(d,K)\) there are no further predictor effects to be estimated (as the model includes the same number of parameters as the full rank term) and additional constraints will need to be introduced.# Model code
# Update our constants from before with the new number of LVs, rest remains the same
HO.nim <- nimbleCode({
for (i in 1:NSites) {
for (j in 1:NSpecies) {
eta[i,j] <- beta0[j] + sum(gamma[j,1:nLVs]*sd.LV[1:nLVs]*z[i,1:nLVs])
log(lambda[i, j]) <- eta[i, j]
Y[i, j] ~ dnegbin(phi[j]/(phi[j]+lambda[i, j]), phi[j])
}
for (l in 1:nLVs) {
XB[i, l] <- sum(X[i, 1:NEnv]*BSTAR[1:NEnv, l])
epsilonSTAR[i,l] ~ dnorm(0, sd = sd.Site[l]) # Residual
zSTAR[i,l] <- XB[i,l] + epsilonSTAR[i,l]
}
}
for(j in 1:NSpecies) {
for (l in 1:nLVs) {
omegaTR[j, l] <- sum(TR[j, 1:NTraits]*OSTAR[1:NTraits, l])
varepsilonSTAR[j,l] ~ dnorm(0, sd = sd.Species[l]) # Residual
gammaSTAR[j,l] <- omegaTR[j,l] + varepsilonSTAR[j,l]
}
beta0[j] ~ dnorm(0, sd=1)
}
# Constraints to 0 on upper diagonal
# stole some code from Boral for this - thanks Francis
for(l in 1:(nLVs-1)) {
for(ll in (l+1):(nLVs)) {
BSTAR[l,ll] <- 0
OSTAR[l,ll] <- 0
}
}
for(l in 1:nLVs) {
# diagonal elements
BSTAR[l,l] ~ dnorm(0,sd=1)
OSTAR[l,l] ~ dnorm(0,sd=1)
## standardizing z and gamma
zmu[l] <- mean(zSTAR[1:NSites,l])
StdDev.z[l] <- sqrt(mean((zSTAR[1:NSites,l] - zmu[l])^2))
z[1:NSites,l] <- (zSTAR[1:NSites,l]-zmu[l])/StdDev.z[l]
gammamu[l] <- mean(gammaSTAR[1:NSpecies,l])
StdDev.gamma[l] <- sqrt(mean((gammaSTAR[1:NSpecies,l] - gammamu[l])^2))
gamma[1:NSpecies,l] <- (gammaSTAR[1:NSpecies,l]-gammamu[l])/StdDev.gamma[l]
# Scale other parameters
epsmu[l] <- mean(epsilonSTAR[1:NSites,l])
varepsmu[l] <- mean(varepsilonSTAR[1:NSpecies,l])
epsilon[1:NSites,l] <- (epsilonSTAR[1:NSites,l]-epsmu[l])/StdDev.z[l]
B[1:NEnv,l] <- BSTAR[1:NEnv,l]/StdDev.z[l]
varepsilon[1:NSpecies,l] <- (varepsilonSTAR[1:NSpecies,l]-varepsmu[l])/StdDev.gamma[l]
O[1:NTraits,l] <- OSTAR[1:NTraits,l]/StdDev.gamma[l]
# priors for scales
sd.Site[l] ~ dexp(1)
sd.Species[l] ~ dexp(1)
sd.LV[l] ~ dexp(1)
}
## Free lower diagonals
for(l in 2:nLVs) {
for(ll in 1:(l-1)) {
BSTAR[l,ll] ~ dnorm(0,sd=1)
OSTAR[l,ll] ~ dnorm(0,sd=1)
}
}
## All other elements
for(l in (nLVs+1):NEnv) {
for(ll in 1:(nLVs)) {
BSTAR[l,ll] ~ dnorm(0, sd = 1)
}
}
for(l in (nLVs+1):NTraits) {
for(ll in 1:(nLVs)) {
OSTAR[l,ll] ~ dnorm(0, sd = 1)
}
}
for(j in 1:NSpecies){
phi[j] ~ dexp(1)
}
})
Our model is:
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]
There are many possible rotations (see the GPArotation package, https://cran.r-project.org/web/packages/GPArotation/index.html, for a long list). Here we will rotate based on a singular value decomposition (SVD) of the latent variables, as in the gllvm R-package (Niku et al. 2021). This rotates the ordination so that most of the variation in \(\boldsymbol{z}_i^\top \boldsymbol{\Sigma}\) is put on the first latent variable, with the second latent variable capturing the next largest amount of variance. Note that, unlike in an eigenanalysis, this does not guarantee that these are the latent variables that explain most variation in the response.
In detail, for each draw from the posterior we:
The first step sweeps the scale of the LVs and loadings into the LV-specific scale, so that it still represents the total variance of the latent variables.
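A small sketch of what the SVD rotation does (toy data, not the fitted model): rotating the site scores by their right singular vectors concentrates variance on the first axis, and as long as the loadings are rotated by the same orthogonal matrix, the product \(\boldsymbol{Z}\boldsymbol{\Gamma}^\top\), and hence the linear predictor, is unchanged.

```r
# SVD rotation: variance piles onto axis 1; Z %*% t(G) is unchanged
set.seed(1)
Z <- matrix(rnorm(100*2), 100, 2) %*% matrix(c(1, 0.8, 0.8, 1), 2, 2)  # correlated LVs
G <- matrix(rnorm(8*2), 8, 2)                                          # loadings
rot <- svd(Z)$v                        # right singular vectors of the site scores
Zr <- Z %*% rot; Gr <- G %*% rot       # rotate scores and loadings together
round(apply(Zr, 2, var), 2)            # variance concentrated on column 1
all.equal(Z %*% t(G), Zr %*% t(Gr))    # TRUE: rotation is orthogonal
```

The RotateChains() function above applies this draw-by-draw to the posterior samples, which is why the SDs and the other loading-like quantities all need to be rotated along.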
We can run the fitting with the same code as before:
consts$nLVs <- nLVs <- 2
HO.2LV <- ParaNimble(4, fun = RunOneChain,
dat = dat,
code = HO.nim,
inits = inits2,
# Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
Nburn = 1e3, NIter = 5e3, Nthin = 1, # for a full run
consts = consts,
block = MakeBlockList(consts, LVwise=TRUE), RW = c(paste0("beta0[",1:consts$NSpecies,"]"),paste0("phi[",1:consts$NSpecies,"]"))) # HMC on the rest (hypers))
and do the post-processing with functions previously defined:
# Re-scale gamma so that its scale is absorbed by LVsd
chains2LV.sc <- RescaleChains(HO.2LV, ScaleBy = "gamma",
SDTorescale = c("sd.Species"),
ToRescale = c("gamma", "O", "varepsilon"))
# Re-scale Z so that its scale is absorbed by LVsd
chains2LV.sc2 <- RescaleChains(chains2LV.sc, ScaleBy = "z",
SDTorescale = c("sd.Site"),
ToRescale = c("z", "B", "epsilon"))
# Now rotate based on Z*Sigma and return its scale as LVsd.
chains2LV.r <- RotateChains(chains2LV.sc2, rotation.fn = "svd", scale = T)
# post-process chains for sign-swapping
chains2LV <- postProcess(chains2LV.r, VarsToProcess = VarsToProcess, VarsToSwapBy = c("1]" = "B[26, 1]", "2]" = "B[22, 2]"), print = F)
# Calculate summaries
summ.HO2LV <- summary(chains2LV)
rm(chains2LV.sc, chains2LV.sc2, chains2LV.r)
Environment first:
chainsPlot(chains2LV, var = "B", legend = F, ncols = 2)
And the trait effects:
chainsPlot(chains2LV, var = "O", legend = F, ncols = 2)
The LV-specific standard deviations give an impression of the relative importance of the latent variables:
SDs <- summ.HO2LV$statistics[grep("sd", rownames(summ.HO2LV$statistics)),]
rownames(SDs) <- c(paste0("Latent Variable ", 1:2),
paste0("Site ", 1:2),
paste0("Species ", 1:2))
knitr::kable(SDs, digits=2, format = "html", table.attr = "style='width:70%;'")
| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| Latent Variable 1 | 1.86 | 0.10 | 0 | 0.00 |
| Latent Variable 2 | 0.94 | 0.07 | 0 | 0.00 |
| Site 1 | -0.03 | 0.18 | 0 | 0.02 |
| Site 2 | 0.02 | 0.13 | 0 | 0.01 |
| Species 1 | 0.00 | 0.01 | 0 | 0.00 |
| Species 2 | 0.00 | 0.02 | 0 | 0.00 |
We see that the first latent variable explains about twice as much variation as the second. We can also present this as a ratio:
Ratio <- as.mcmc.list(lapply(chains2LV,function(ch)as.mcmc(data.frame(ch=ch[,"sd.LV[2]"]/ch[,"sd.LV[1]"]))))
chainsPlot(Ratio, legend=F)
Again we examine the predictor effects in a caterpillar plot to spot non-zero predictor effects:
par(mar=c(5, 10, 4, 2) + 0.1)
PlotPost("B",summ.HO2LV,varnames = paste(rep(colnames(X),2), rep(paste0("LV.",1:2),each=ncol(X)), sep = "-"), xlab="Predictor effect", ylab=NA)
PlotPost("O",summ.HO2LV,varnames = paste(rep(colnames(TR),2), rep(paste0("LV.",1:2),each=ncol(TR)), sep = "-"), xlab="Predictor effect", ylab=NA)
From which we see that there are various non-zero effects for the environmental effects, but that generally the trait effects are still too uncertain to draw any conclusions.
However, what we are really interested in is the environment-trait interactions. We can plot those with the following code.
intChains <- getMCMCInt(chains2LV, dat = dat)
summ.EnvTraitInt <- summary(intChains)
coefs <- matrix(summ.EnvTraitInt$statistics[,1], ncol = ncol(dat$TR), nrow=ncol(dat$X), dimnames = list(colnames(dat$X),colnames(dat$TR)))
# This plotting code was retrieved from the gllvm vignette: https://cran.r-project.org/web/packages/gllvm/vignettes/vignette1.html
a <- 0.2
colort <- colorRampPalette(c("blue", "white", "red"))
levelplot(coefs, xlab = "Environmental Variables",
ylab = "Species traits", col.regions = colort(100), cex.lab = 1.3,
at = seq(-a, a, length = 100), scales = list(x = list(rot = 45)))
We can see that Moisture, elevation, and management have the most pronounced effects; these are the variables included in the analysis of this dataset by Niku et al. (2021).
Now we can make two-dimensional ordination plots of sites and species, with their predictor effects. Note that these have been rotated, so that most of the variance should be on the first axis.
z.m <- GetMeans(summ.HO2LV, name="^z", d=consts$nLVs)
gamma.m <- GetMeans(summ.HO2LV, name="^gamma", d=consts$nLVs)
vareps.m <- GetMeans(summ.HO2LV, name="^varepsilon", d=consts$nLVs)
eps.m <- GetMeans(summ.HO2LV, name="^epsilon", d=consts$nLVs)
B.m <- GetMeans(summ.HO2LV, name="B", d=consts$nLVs)
row.names(B.m) <- colnames(dat$X)
O.m <- GetMeans(summ.HO2LV, name="O", d=consts$nLVs)
row.names(O.m) <- colnames(dat$TR)
par(mfrow=c(2,1))
#LVs
plot(z.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Sites")
text(z.m,labels=1:consts$NSites)
AddArrows(marg = par("usr"), coords=B.m, col="red")
#gammas
plot(gamma.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Species")
text(gamma.m,labels=vegan::make.cepnames(colnames(Y)))
AddArrows(marg = par("usr"), coords=O.m, col="blue")
We can see that almost all of the environmental effects seem to act in the same way, mainly on LV1.
The covariance of species is determined by traits, LV-specific variation and sites’ residual variation, and the covariance between sites is determined by the environment, LV-specific variation and species’ residual variation.
The residual covariance matrix of the model (from which species associations are calculated) is determined by three terms: \[\begin{equation} \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i + \boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j. \end{equation}\] The first term tells us that correlations between species are determined by the environment. The second term tells us that the correlation between sites is determined by species traits. The last term induces no correlation between sites and species for diagonal \(\boldsymbol{\Sigma}\), but serves to scale species associations. Consequently, the covariance for species \(j,j2\) and sites \(i,i2\) is: \[\begin{multline} \Sigma_{i,i2,j,j2} = \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \\ \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j},\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}). \end{multline}\] Third-order moments of zero-mean normal random variables vanish, so this simplifies to: \[\begin{equation} \Sigma_{i,i2,j,j2} = \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2}+ \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2})\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}). \end{equation}\] Here, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}) = \text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2}) = 0\), and \(\text{cov}(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) = 2\text{tr}(\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma})\). Further, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2})\) is zero for \(j \neq j2\) and \(\text{diag}(\boldsymbol{\delta}^2)\) otherwise, and similarly \(\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2})\) is zero for \(i \neq i2\) and \(\text{diag}(\boldsymbol{\sigma}^2)\) otherwise.
Consequently, for the species association matrix we consider the block of the covariance matrix with \(i = i2\), and for the site-to-site matrix the block with \(j = j2\), so that we have:
\[\begin{split} \Sigma_{j,j2} &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}) + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} \\ &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2}+ 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}, \end{split}\] and
\[\begin{equation} \Sigma_{i,i2} = \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}. \end{equation}\] We can visualize those matrices for, e.g., the first site and the first species, respectively.
#species
spec.mat <- matrix(0, nrow=consts$NSpecies, ncol=consts$NSpecies)
Sigma <- diag(SDs[grep("Latent Variable", rownames(SDs)),1]^2)
Sigma.sp <- diag(SDs[grep("Species", rownames(SDs)),1]^2)
Sigma.si <- diag(SDs[grep("Site", rownames(SDs)),1]^2)
for(j in 1:consts$NSpecies){
  for(j2 in 1:consts$NSpecies){
    if(j==j2){
      # environmental term: non-zero on the diagonal only
      spec.mat[j,j2] = X[1,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[1,,drop=F])
    }
    # trait term plus the scaling term
    spec.mat[j,j2] = spec.mat[j,j2] + TR[j,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[j2,,drop=F]) + 2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma))
  }
}
spec.cor.mat <- cov2cor(spec.mat)
colnames(spec.cor.mat) <- row.names(spec.cor.mat) <- colnames(Y)
corrplot::corrplot(spec.cor.mat, type = "lower", order = "AOE", main = "Species", mar = c(1,1,1,1), tl.srt=45, tl.cex = .5)
#sites
site.mat <- matrix(0, nrow=consts$NSites, ncol=consts$NSites)
for(i in 1:consts$NSites){
  for(i2 in 1:consts$NSites){
    if(i==i2){
      # trait term: non-zero on the diagonal only
      site.mat[i,i2] = TR[1,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[1,,drop=F])
    }
    # environmental term plus the scaling term
    site.mat[i,i2] = site.mat[i,i2] + X[i,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[i2,,drop=F]) + 2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma))
  }
}
site.cor.mat <- cov2cor(site.mat)
colnames(site.cor.mat) <- row.names(site.cor.mat) <- paste("Site", 1:consts$NSites)
corrplot::corrplot(site.cor.mat, type = "lower", order = "AOE", main = "Sites", mar = c(3,3,3,3), tl.srt=45, tl.cex = .5)